{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ba89be6e",
   "metadata": {},
   "source": [
    "## Mistral-7B-Instruct_GPTQ - Finetune on finance-alpaca dataset\n",
    "\n",
    "### Checkout my [Twitter(@rohanpaul_ai)](https://twitter.com/rohanpaul_ai) for daily LLM bits"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60c514b4",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/rohan-paul/LLM-FineTuning-Large-Language-Models/blob/main/Mistral_7B_Instruct_GPTQ_finetune.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "43537678",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install --upgrade trl peft accelerate bitsandbytes datasets auto-gptq optimum -q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7dd0ba1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from accelerate import FullyShardedDataParallelPlugin, Accelerator\n",
    "from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig\n",
    "import torch\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForLanguageModeling, BitsAndBytesConfig\n",
    "from datasets import load_dataset\n",
    "import pandas as pd\n",
    "import logging\n",
    "import os\n",
    "from pathlib import Path\n",
    "from typing import Optional, Tuple\n",
    "from peft import LoraConfig, PeftConfig, PeftModel\n",
    "from transformers import GPTQConfig\n",
    "from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0fc11aa4-42bf-4082-ab1c-067c400e5ca0",
   "metadata": {
    "id": "TEzYBadkyRgd"
   },
   "outputs": [],
   "source": [
    "fsdp_plugin = FullyShardedDataParallelPlugin(\n",
    "    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),\n",
    "    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),\n",
    ")\n",
    "\n",
    "accelerator = Accelerator(fsdp_plugin=fsdp_plugin)\n",
    "\n",
    "dataset = load_dataset('gbharti/finance-alpaca')\n",
    "# Split the dataset into train and test sets\n",
    "train_test_split = dataset['train'].train_test_split(test_size=0.1)\n",
    "train_dataset = train_test_split['train']\n",
    "test_dataset = train_test_split['test']\n",
    "\n",
    "# Further split the train dataset into train and validation sets\n",
    "train_val_split = train_dataset.train_test_split(test_size=0.1)\n",
    "train_dataset = train_val_split['train']\n",
    "eval_dataset = train_val_split['test']\n",
    "\n",
    "\n",
    "\n",
    "##############\n",
    "\n",
    "pretrained_model_name_or_path = \"TheBloke/Mistral-7B-Instruct-v0.1-GPTQ\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75b9604f",
   "metadata": {},
   "source": [
    "![](assets/2023-12-30-23-50-29.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "64e259ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize(prompt):\n",
    "    result = tokenizer(\n",
    "        prompt,\n",
    "        truncation=True,\n",
    "        max_length=512,\n",
    "        padding=\"max_length\",\n",
    "    )\n",
    "    result[\"labels\"] = result[\"input_ids\"].copy()\n",
    "    return result\n",
    "\n",
    "def format_input_data_to_build_model_prompt(data_point):\n",
    "        instruction = str(data_point['instruction'])\n",
    "        input_query = str(data_point['input'])\n",
    "        response = str(data_point['output'])\n",
    "\n",
    "        if len(input_query.strip()) == 0:\n",
    "            full_prompt_for_model = f\"\"\"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{instruction} \\n\\n### Input:\\n{input_query}\\n\\n### Response:\\n{response}\"\"\"\n",
    "\n",
    "        else:\n",
    "            full_prompt_for_model = f\"\"\"Below is an instruction that describes a task. Write a response that appropriately completes the request.\\n\\n### Instruction:\\n{instruction}\\n\\n### Response:\\n{response}\"\"\"\n",
    "        return tokenize(full_prompt_for_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34fd2826",
   "metadata": {},
   "source": [
    "## Need for input data formatting i.e. `format_input_data_to_build_model_prompt` method\n",
    "\n",
    "📌 The `format_input_data_to_build_model_prompt` method processes the input DataFrame, which contains columns like 'instruction', 'input', and 'output', representing different components of a training sample. The method consolidates these components into a single 'text' column, formatted in a structured way that aligns with the training requirements of LLMs.\n",
    "\n",
    "📌 Specifically, the method constructs each entry in the 'text' column as a concatenation of the instruction, the context (if provided), and the expected response. This formatting is key for fine-tuning models like LLMs that are based on transformer architectures. It ensures the correct associations between the prompts (instructions and input queries) and the expected responses.\n",
    "\n",
    "==============\n",
    "\n",
    "##  Prompt format for mistralai/Mixtral-8x7B-v0.1 🔥\n",
    "\n",
    "https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/discussions/22\n",
    "\n",
    "\n",
    "\"Mixtral-8x7B-v0.1\" is a base model, therefore it doesn't need to be prompted in a specific way in order to get started with the model. If you want to use the instruct version of the model, you need to follow the template that is on the model card: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1#instruction-format\n",
    "\n",
    "The template used to build a prompt for the Instruct model is defined as follows:\n",
    "\n",
    "```\n",
    "<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "de76e992",
   "metadata": {},
   "outputs": [],
   "source": [
    "def build_qlora_model(\n",
    "    pretrained_model_name_or_path: str = \"TheBloke/Mistral-7B-Instruct-v0.1-GPTQ\",\n",
    "    gradient_checkpointing: bool = True,\n",
    "    cache_dir: Optional[Path] = None,\n",
    ") -> Tuple[AutoModelForCausalLM, AutoTokenizer, PeftConfig]:\n",
    "    \"\"\"\n",
    "    Args:\n",
    "        pretrained_model_name_or_path (str): The name or path of the pretrained model to use.\n",
    "        gradient_checkpointing (bool): Whether to use gradient checkpointing or not.\n",
    "        cache_dir (Optional[Path]): The directory to cache the model in.\n",
    "\n",
    "    Returns:\n",
    "        Tuple[AutoModelForCausalLM, AutoTokenizer]: A tuple containing the built model and tokenizer.\n",
    "    \"\"\"\n",
    "\n",
    "    # If I am using any GPTQ model, then need to comment-out bnb_config\n",
    "    # as I can not quantize an already quantized model\n",
    "\n",
    "    # bnb_config = BitsAndBytesConfig(\n",
    "    #     load_in_4bit=True,\n",
    "    #     bnb_4bit_use_double_quant=True,\n",
    "    #     bnb_4bit_compute_dtype=torch.bfloat16\n",
    "    # )\n",
    "\n",
    "    # In below as well, when using any GPTQ model\n",
    "    # comment-out the quantization_config param\n",
    "\n",
    "    tokenizer = AutoTokenizer.from_pretrained(\n",
    "        pretrained_model_name_or_path,\n",
    "        padding_side=\"left\",\n",
    "        add_eos_token=True,\n",
    "        add_bos_token=True,\n",
    "    )\n",
    "    tokenizer.pad_token = tokenizer.eos_token\n",
    "\n",
    "    quantization_config_loading = GPTQConfig(bits=4, use_exllama=False, tokenizer=tokenizer)\n",
    "\n",
    "    model = AutoModelForCausalLM.from_pretrained(\n",
    "        pretrained_model_name_or_path,\n",
    "        # quantization_config=bnb_config,\n",
    "        quantization_config=quantization_config_loading,\n",
    "        device_map=\"auto\",\n",
    "        cache_dir=str(cache_dir) if cache_dir else None,\n",
    "    )\n",
    "\n",
    "    #disable tensor parallelism\n",
    "    model.config.pretraining_tp = 1\n",
    "\n",
    "    if gradient_checkpointing:\n",
    "        model.gradient_checkpointing_enable()\n",
    "        model.config.use_cache = (\n",
    "            False  # Gradient checkpointing is not compatible with caching.\n",
    "        )\n",
    "    else:\n",
    "        model.gradient_checkpointing_disable()\n",
    "        model.config.use_cache = True  # It is good practice to enable caching when using the model for inference.\n",
    "\n",
    "    return model, tokenizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "18d573c7",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. ['use_cuda_fp16', 'use_exllama', 'max_input_length', 'exllama_config', 'disable_exllama']) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.\n"
     ]
    }
   ],
   "source": [
    "model, tokenizer = build_qlora_model(pretrained_model_name_or_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "f4f95dc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "model = prepare_model_for_kbit_training(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9c769d0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from peft import LoraConfig, get_peft_model\n",
    "\n",
    "config = LoraConfig(\n",
    "    r=8,\n",
    "    lora_alpha=16,\n",
    "    target_modules=[\n",
    "        \"q_proj\",\n",
    "        \"k_proj\",\n",
    "        \"v_proj\",\n",
    "        \"o_proj\"\n",
    "    ],\n",
    "    bias=\"none\",\n",
    "    lora_dropout=0.05,  # Conventional\n",
    "    task_type=\"CAUSAL_LM\",\n",
    ")\n",
    "\n",
    "model = get_peft_model(model, config)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "22d5b36e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "trainable params: 6815744 || all params: 269225984 || trainable%: 2.5316070532033046\n"
     ]
    }
   ],
   "source": [
    "def print_trainable_parameters(model):\n",
    "    \"\"\"\n",
    "    Prints the number of trainable parameters in the model.\n",
    "    \"\"\"\n",
    "    trainable_params = 0\n",
    "    all_param = 0\n",
    "    for _, param in model.named_parameters():\n",
    "        all_param += param.numel()\n",
    "        if param.requires_grad:\n",
    "            trainable_params += param.numel()\n",
    "    print(\n",
    "        f\"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}\"\n",
    "    )\n",
    "\n",
    "print_trainable_parameters(model)\n",
    "\n",
    "# trainable params: 6815744 || all params: 269225984 || trainable%: 2.5316070532033046\n",
    "\n",
    "# Apply the accelerator. You can comment this out to remove the accelerator.\n",
    "model = accelerator.prepare_model(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86da2d8f",
   "metadata": {},
   "source": [
    "###########################3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2667459f-fca0-480c-9934-f3b9a7b6361b",
   "metadata": {},
   "outputs": [],
   "source": [
    "tokenized_train_dataset = train_dataset.map(format_input_data_to_build_model_prompt)\n",
    "tokenized_val_dataset = eval_dataset.map(format_input_data_to_build_model_prompt)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffe65c1a-62c2-4a36-99e3-3e972478261a",
   "metadata": {
    "id": "Vxbl4ACsyRgi"
   },
   "source": [
    "### Let's grab a single data point from our testset (both instruction and output) to see how the base model does on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "24f21bb8-e2df-4d76-bec4-bcbe966310c8",
   "metadata": {
    "id": "k_VRZDh9yRgi",
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Instruction Sentence: Describe how a person's life might be different if he/she won the lottery.\n",
      "Output: The person could easily afford their desired lifestyle, from buying luxury cars and homes to traveling the world and not having to worry about financial concerns. They could pursue their dream career or start a business or charity of their own, leaving them with a much more fulfilling life. They could give back to their communities and make a positive difference. They can use their wealth to make a lasting impact in the lives of family and friends. All in all, winning the lottery can drastically change a person's life for the better.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(\"Instruction Sentence: \" + test_dataset[1]['instruction'])\n",
    "print(\"Output: \" + test_dataset[1]['output'] + \"\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f5020387-45b8-4a77-a63e-15cf6b1d8d5a",
   "metadata": {
    "id": "gOxnx-cAyRgi"
   },
   "outputs": [],
   "source": [
    "eval_prompt = \"\"\"Given an instruction sentence construct the output.\n",
    "\n",
    "### Instruction sentence:\n",
    "Generate a sentence that describes the main idea behind a stock market crash.\n",
    "\n",
    "\n",
    "### Output\n",
    "\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "889f57b0",
   "metadata": {},
   "source": [
    "Now, to start our fine-tuning, we have to apply some preprocessing to the model to prepare it for training. For that use the `prepare_model_for_kbit_training` method from PEFT."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "d9866d9f-0578-4a61-8b13-f100a1a344ab",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the accelerator. You can comment this out to remove the accelerator.\n",
    "# prepare_model - Prepares a PyTorch model for training in any distributed setup.\n",
    "model = accelerator.prepare_model(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "b54d3b8e-88a6-4fbd-9375-509ea9a296af",
   "metadata": {
    "id": "NidIuFXMyRgi"
   },
   "outputs": [],
   "source": [
    "# Re-init the tokenizer so it doesn't add padding or eos token\n",
    "eval_tokenizer = AutoTokenizer.from_pretrained(\n",
    "    pretrained_model_name_or_path,\n",
    "    add_bos_token=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "93a253a4-a3a8-43b3-abb2-d602d8fa2ab0",
   "metadata": {},
   "outputs": [],
   "source": [
    "device = \"cuda\"\n",
    "model_input = eval_tokenizer(eval_prompt, return_tensors=\"pt\").to(device)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb6f9452-0016-48f7-b355-588907eaff14",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.eval()\n",
    "with torch.no_grad():\n",
    "    print(eval_tokenizer.decode(model.generate(**model_input, max_new_tokens=128)[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6bc6c58-8338-4d5d-a8c0-05dfd7162423",
   "metadata": {
    "id": "dCAWeCzZyRgi"
   },
   "source": [
    "It actually did not do very well out of the box."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f088a21-62b6-46e6-9323-2aa583754f4b",
   "metadata": {
    "id": "cUYEpEK-yRgj"
   },
   "source": [
    "Let's print the model to examine its layers, as we will apply QLoRA to all the linear layers of the model. Those layers are `q_proj`, `k_proj`, `v_proj`, `o_proj`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "5e477004-dbdb-4feb-82a0-681289522fdf",
   "metadata": {
    "id": "XshGNsbxyRgj",
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PeftModelForCausalLM(\n",
      "  (base_model): LoraModel(\n",
      "    (model): MistralForCausalLM(\n",
      "      (model): MistralModel(\n",
      "        (embed_tokens): Embedding(32000, 4096, padding_idx=0)\n",
      "        (layers): ModuleList(\n",
      "          (0-31): 32 x MistralDecoderLayer(\n",
      "            (self_attn): MistralAttention(\n",
      "              (rotary_emb): MistralRotaryEmbedding()\n",
      "              (k_proj): QuantLinear(\n",
      "                (base_layer): QuantLinear()\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=4096, out_features=8, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=8, out_features=1024, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "                (quant_linear_module): QuantLinear()\n",
      "              )\n",
      "              (o_proj): QuantLinear(\n",
      "                (base_layer): QuantLinear()\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=4096, out_features=8, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=8, out_features=4096, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "                (quant_linear_module): QuantLinear()\n",
      "              )\n",
      "              (q_proj): QuantLinear(\n",
      "                (base_layer): QuantLinear()\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=4096, out_features=8, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=8, out_features=4096, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "                (quant_linear_module): QuantLinear()\n",
      "              )\n",
      "              (v_proj): QuantLinear(\n",
      "                (base_layer): QuantLinear()\n",
      "                (lora_dropout): ModuleDict(\n",
      "                  (default): Dropout(p=0.05, inplace=False)\n",
      "                )\n",
      "                (lora_A): ModuleDict(\n",
      "                  (default): Linear(in_features=4096, out_features=8, bias=False)\n",
      "                )\n",
      "                (lora_B): ModuleDict(\n",
      "                  (default): Linear(in_features=8, out_features=1024, bias=False)\n",
      "                )\n",
      "                (lora_embedding_A): ParameterDict()\n",
      "                (lora_embedding_B): ParameterDict()\n",
      "                (quant_linear_module): QuantLinear()\n",
      "              )\n",
      "            )\n",
      "            (mlp): MistralMLP(\n",
      "              (act_fn): SiLU()\n",
      "              (down_proj): QuantLinear()\n",
      "              (gate_proj): QuantLinear()\n",
      "              (up_proj): QuantLinear()\n",
      "            )\n",
      "            (input_layernorm): MistralRMSNorm()\n",
      "            (post_attention_layernorm): MistralRMSNorm()\n",
      "          )\n",
      "        )\n",
      "        (norm): MistralRMSNorm()\n",
      "      )\n",
      "      (lm_head): Linear(in_features=4096, out_features=32000, bias=False)\n",
      "    )\n",
      "  )\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "print(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "630857be-801a-4586-a562-88ffc7772058",
   "metadata": {
    "id": "I6mTLuQJyRgj"
   },
   "source": [
    "\n",
    "📌 `LoraConfig` allows you to control how LoRA is applied to the base model:\n",
    "\n",
    "📌 Rank of Decomposition r\n",
    "\n",
    "r represents the rank of the low rank matrices learned during the finetuning process. As this value is increased, the number of parameters needed to be updated during the low-rank adaptation increases. Intuitively, a lower r may lead to a quicker, less computationally intensive training process, but may affect the quality of the model thus produced. However, increasing r beyond a certain value may not yield any discernible increase in quality of model output.\n",
    "\n",
    "---------------\n",
    "\n",
    "`target_modules` are the names of modules LoRA is applied to. Here it is set to query, key and value which are the names of inner layers of self attention layer from Transformer Architecture.\n",
    "\n",
    "---------------\n",
    "\n",
    "📌 Alpha Parameter for LoRA Scaling `lora_alpha`\n",
    "\n",
    "According to the LoRA article Hu et. al., ∆W is scaled by α / r where α is a constant. When optimizing with Adam, tuning α is roughly the same as tuning the learning rate if the initialization was scaled appropriately. The reason is that the number of parameters increases linearly with r. As you increases r, the values of the entries in ∆W also scale linearly with r. We want ∆W to scale consistently with the pretrained weights no matter what r is used. That’s why the authors set α to the first r and do not tune it. The default of α is 8.\n",
    "\n",
    "---------\n",
    "\n",
    "📌 `Dropout Rate (lora_dropout)`: This is the probability that each neuron’s output is set to zero during training, used to prevent overfitting.\n",
    "\n",
    "So Dropout is a general technique in Deep Learning, to reduce overfitting by randomly selecting neurons to ignore with a dropout probability during training. The contribution of those selected neurons to the activation of downstream neurons is temporally removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass. The default of lora_dropout is 0."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9eb4c7fe-c05a-479a-9208-84d86d22c0bf",
   "metadata": {
    "id": "_0MOtwf3zdZp"
   },
   "source": [
    "### Training!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "a9d0aae4-60e2-4853-a949-1d2026c66e98",
   "metadata": {
    "id": "c_L1131GyRgo"
   },
   "outputs": [],
   "source": [
    "if torch.cuda.device_count() > 1: # If more than 1 GPU\n",
    "    model.is_parallelizable = True\n",
    "    model.model_parallel = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "edbb95a0-9f0f-465a-9657-506596615afb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "torch.cuda.device_count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "832143f1-35a3-454c-82f9-42f195a03c8f",
   "metadata": {
    "id": "jq0nX33BmfaC",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import transformers\n",
    "from datetime import datetime\n",
    "\n",
    "project = \"Mixtral-alpaca-finance-finetune\"\n",
    "base_model_name = \"mixtral\"\n",
    "run_name = base_model_name + \"-\" + project\n",
    "output_dir = \"./\" + run_name\n",
    "\n",
    "tokenizer.pad_token = tokenizer.eos_token\n",
    "\n",
    "trainer = transformers.Trainer(\n",
    "    model=model,\n",
    "    train_dataset=tokenized_train_dataset,\n",
    "    eval_dataset=tokenized_val_dataset,\n",
    "    args=transformers.TrainingArguments(\n",
    "        output_dir=output_dir,\n",
    "        warmup_steps=5,\n",
    "        per_device_train_batch_size=1,\n",
    "        gradient_checkpointing=True,\n",
    "        gradient_accumulation_steps=4,\n",
    "        max_steps=1000,\n",
    "        learning_rate=2.5e-5,\n",
    "        logging_steps=25,\n",
    "        fp16=True,\n",
    "        optim=\"paged_adamw_8bit\",\n",
    "        logging_dir=\"./logs\",        # Directory for storing logs\n",
    "        save_strategy=\"steps\",       # Save the model checkpoint every logging step\n",
    "        save_steps=50,                # Save checkpoints every 50 steps\n",
    "        evaluation_strategy=\"steps\", # Evaluate the model every logging step\n",
    "        eval_steps=50,               # Evaluate and save checkpoints every 50 steps\n",
    "        do_eval=True,                # Perform evaluation at the end of training\n",
    "        # report_to=\"wandb\",           # Comment this out if you don't want to use weights & baises\n",
    "        run_name=f\"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}\"          # Name of the W&B run (optional)\n",
    "    ),\n",
    "    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),\n",
    ")\n",
    "\n",
    "model.config.use_cache = False  # silence the warnings. Re-enable for inference!\n",
    "trainer.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24beb8e2-1ea7-4c30-a8cc-37ff6b6e62b0",
   "metadata": {
    "id": "0D57XqcsyRgo"
   },
   "source": [
    "### Evaluate the Trained Model!\n",
    "\n",
    "However, before going to the evaluation code, it's a good idea to kill the current process so that to avoid possible out of memory loading the base model again on top of the model we just trained. \n",
    "\n",
    "Hence, to kill the current process => Go to `Kernel > Restart Kernel` or kill the process via the Terminal (`nvidia smi` > `kill [PID]`). \n",
    "\n",
    "### By default, the PEFT library will only save the QLoRA adapters, so we need to first load the base Mixtral model from the Huggingface Hub:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "538a7d9f-71f1-4b1e-bf96-e232ad302180",
   "metadata": {
    "id": "SKSnF016yRgp"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig\n",
    "\n",
    "pretrained_model_name_or_path = \"TheBloke/Mistral-7B-Instruct-v0.1-GPTQ\"\n",
    "\n",
    "# bnb_config = BitsAndBytesConfig(\n",
    "#     load_in_4bit=True,\n",
    "#     bnb_4bit_use_double_quant=True,\n",
    "#     bnb_4bit_compute_dtype=torch.bfloat16\n",
    "# )\n",
    "\n",
    "base_model = AutoModelForCausalLM.from_pretrained(\n",
    "    pretrained_model_name_or_path,  # Mixtral, same as before\n",
    "    # quantization_config=bnb_config,  # Same quantization config as before, but commented out as its a GPTQ model (which is already quantized )\n",
    "    quantization_config=quantization_config_loading,\n",
    "    device_map=\"auto\",\n",
    "    trust_remote_code=True,\n",
    ")\n",
    "\n",
    "eval_tokenizer = AutoTokenizer.from_pretrained(\n",
    "    pretrained_model_name_or_path,\n",
    "    add_bos_token=True,\n",
    "    trust_remote_code=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4f4580e-1d9d-4c8d-b8d1-05cc7dee3bcf",
   "metadata": {
    "id": "_BxOhAiqyRgp"
   },
   "source": [
    "### Noting again, by default, the PEFT library will only save the QLoRA adapters, so we need to first load the base Mixtral model from the Huggingface Hub:\n",
    "\n",
    "Now load the QLoRA adapter from the appropriate checkpoint directory, i.e. the best performing model checkpoint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6bdefc4-8b5b-4c16-82ff-ae54b70a50b4",
   "metadata": {
    "id": "GwsiqhWuyRgp"
   },
   "outputs": [],
   "source": [
    "from peft import PeftModel\n",
    "\n",
    "ft_model = PeftModel.from_pretrained(base_model, \"mistral-finetune-alpaca-GPTQ/checkpoint-500\")\n",
    "\n",
    "# Here, \"mistral-finetune-alpaca-GPTQ/checkpoint-500\" is the adapter name"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "332c2771-3e84-405a-a780-1392bc6b737f",
   "metadata": {
    "id": "lX39ibolyRgp"
   },
   "source": [
    "and run your inference!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f99ff63-728e-4cd7-a5a2-4a3580b00f84",
   "metadata": {
    "id": "UUehsaVNyRgp"
   },
   "source": [
    "Let's try the same `eval_prompt` and thus `model_input` as above, and see if the new finetuned model performs better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "240eaf08-96f9-434c-8d3a-a77939eaeab8",
   "metadata": {
    "id": "lMkVNEUvyRgp"
   },
   "outputs": [],
   "source": [
    "eval_prompt = \"\"\"\"Given an instruction sentence construct the output.\n",
    "\n",
    "### Instruction sentence:\n",
    "Generate a sentence that describes the main idea behind a stock market crash.\n",
    "\n",
    "\n",
    "### Output\n",
    "\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "model_input = eval_tokenizer(eval_prompt, return_tensors=\"pt\").to(\"cuda\")\n",
    "\n",
    "ft_model.eval()\n",
    "with torch.no_grad():\n",
    "    print(eval_tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=50)[0], skip_special_tokens=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1184ef2f",
   "metadata": {},
   "source": [
    "## `PeftModel.from_pretrained` - Explanations\n",
    "\n",
    "https://huggingface.co/docs/peft/package_reference/peft_model#peft.PeftModel.from_pretrained\n",
    "\n",
    "\n",
    "When I do the below line\n",
    "\n",
    "`ft_model = PeftModel.from_pretrained(base_model, model_id)`\n",
    "\n",
    "\n",
    "--------------------\n",
    "\n",
    "### Source Code\n",
    "\n",
    "https://github.com/huggingface/peft/blob/v0.7.1/src/peft/peft_model.py#L282\n",
    "\n",
    "```\n",
    "def from_pretrained(\n",
    "        cls,\n",
    "        model: torch.nn.Module,\n",
    "        model_id: Union[str, os.PathLike],\n",
    "        adapter_name: str = \"default\",\n",
    "        is_trainable: bool = False,\n",
    "        config: Optional[PeftConfig] = None,\n",
    "        **kwargs: Any,\n",
    "    ) -> \"PeftModel\":\n",
    "        r\"\"\"\n",
    "        Instantiate a PEFT model from a pretrained model and loaded PEFT weights.\n",
    "\n",
    "        Note that the passed `model` may be modified inplace.\n",
    "\n",
    "        Args:\n",
    "            model ([`torch.nn.Module`]):\n",
    "                The model to be adapted. For 🤗 Transformers models, the model should be initialized with the\n",
    "                [`~transformers.PreTrainedModel.from_pretrained`].\n",
    "            model_id (`str` or `os.PathLike`):\n",
    "                The name of the PEFT configuration to use. Can be either:\n",
    "                    - A string, the `model id` of a PEFT configuration hosted inside a model repo on the Hugging Face\n",
    "                      Hub.\n",
    "                    - A path to a directory containing a PEFT configuration file saved using the `save_pretrained`\n",
    "                      method (`./my_peft_config_directory/`).\n",
    "            adapter_name (`str`, *optional*, defaults to `\"default\"`):\n",
    "                The name of the adapter to be loaded. This is useful for loading multiple adapters.\n",
    "\n",
    "\n",
    "```\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
